GPGPU Register File Management by Hardware Co-operated Register Reallocation
Abstract
To hold the contexts of a massive number of parallel threads, GPGPUs use a huge register file. Because of its size, the register file is one of the most power-hungry components of a GPGPU. Moreover, current trends indicate that GPGPU register files will continue to grow as the demand for higher single-instruction, multiple-thread (SIMT) parallelism increases, particularly in the high-performance application domain. To reduce power consumption, this work exploits a fundamental observation: a fairly high portion of register space is allocated unnecessarily and burns power, because registers are treated as a private resource of each warp. For example, even after a register's value is no longer used by any instruction, its space remains occupied by the warp for the rest of the program's execution. In a GPGPU, a single register used in the code is allocated once per thread, i.e., thousands of times, so the power and space overhead of any wasted register is significant. Instead, we propose to share the register file across warps. In the proposed allocation scheme, a register is released from its physical register space immediately after its last use, and the released space is then reassigned to a register of another warp. Register lifetime information from compile-time analysis provides hints to the hardware about each register's release point. To enable this reassignment, we propose a lightweight register-renaming mechanism in hardware. By releasing registers and reusing them across warps, we reduce the required register file size by 30% on average compared with optimally compiled applications. The reduced live register space yields average static power savings of 23% and 26% over basic sub-array-level and individual-register-level power gating, respectively.
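The abstract combines two cooperating pieces: compiler-derived release points and a hardware rename table that maps each warp's architectural registers onto a pool shared by all warps. The Python sketch below models both, but only under stated assumptions: straight-line code in SSA form (each register written once, no branches), a hypothetical staggered round-robin warp scheduler, and a single shared free list. The names `last_use_points`, `SharedRegisterFile`, and `run` are illustrative and do not come from the paper, which does not specify its rename-table organization or hint encoding here.

```python
from collections import deque


def last_use_points(program):
    """Simplified compile-time lifetime analysis for straight-line SSA code.
    Returns {instruction index: set of registers that are dead after it}."""
    last = {}
    for i, (dst, srcs) in enumerate(program):
        for reg in srcs:
            last[reg] = i  # a later read moves the release point forward
        if dst is not None:
            last[dst] = i  # a value that is never read dies where it is written
    release_at = {}
    for reg, i in last.items():
        release_at.setdefault(i, set()).add(reg)
    return release_at


class SharedRegisterFile:
    """Hypothetical hardware model: a rename table maps (warp, arch_reg) to a
    physical register drawn from one free list shared by all warps."""

    def __init__(self, num_physical):
        self.free = deque(range(num_physical))
        self.rename = {}  # (warp_id, arch_reg) -> physical register index
        self.peak = 0     # largest number of simultaneously live registers

    def write(self, warp, reg):
        if not self.free:
            raise RuntimeError("shared register pool exhausted")
        phys = self.free.popleft()
        self.rename[(warp, reg)] = phys
        self.peak = max(self.peak, len(self.rename))
        return phys

    def read(self, warp, reg):
        return self.rename[(warp, reg)]  # rename lookup for an operand

    def release(self, warp, reg):
        self.free.append(self.rename.pop((warp, reg)))  # back to the shared pool


def run(program, num_warps, num_physical):
    """Issue the same kernel on several warps (warp w starts w steps late),
    releasing each register right after its last use."""
    release_at = last_use_points(program)
    rf = SharedRegisterFile(num_physical)
    for step in range(len(program) + num_warps - 1):
        for warp in range(num_warps):
            i = step - warp
            if not 0 <= i < len(program):
                continue
            dst, srcs = program[i]
            for reg in srcs:
                rf.read(warp, reg)
            if dst is not None:
                rf.write(warp, dst)  # allocate before the sources are freed
            for reg in release_at.get(i, ()):
                rf.release(warp, reg)  # compiler hint: last use reached
    return rf


# r0 = load a; r1 = load b; r2 = r0 + r1; r3 = r2 * 2; store r3
kernel = [
    ("r0", ()),
    ("r1", ()),
    ("r2", ("r0", "r1")),
    ("r3", ("r2",)),
    (None, ("r3",)),
]
rf = run(kernel, num_warps=4, num_physical=8)
print("private allocation:", 4 * 4, "registers")  # 4 warps x 4 registers
print("shared pool peak:", rf.peak, "registers")  # prints 5
```

For this toy kernel, private per-warp allocation reserves 16 registers, while the shared pool never holds more than 5 live ones. Each physical entry here stands for one warp-wide register, i.e., one slot per thread in the warp. Allocating the destination before freeing the sources is a conservative choice. When the pool is exhausted, the sketch raises an error where real hardware would presumably have to stall the warp.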
Similar resources
Dynamic Detection of Uniform and Affine Vectors in GPGPU Computations
We present a hardware mechanism which dynamically detects uniform and affine vectors used in Graphics Processing Units, to minimize pressure on the register file and reduce power consumption with minimal architectural modifications. A preliminary experimental analysis conducted with a simulator shows that this optimization can benefit up to 34 % of register file reads and 22 % of the computatio...
Efficient Register Assignment through Reallocation
In modern superscalar microarchitectures, access to the register file lies on the critical schedule-to-execute path, necessitating smaller files in order to achieve higher clock frequencies. In this paper, we propose innovative register allocation mechanisms to increase the parallelism exploited from smaller register files. Our techniques introduce write-after-read dependencies to facilitate registe...
Security-aware register placement to hinder malicious hardware updating and improve Trojan detectability
Nowadays, most designers prefer to outsource parts of their design and fabrication process to third-party companies due to reliability problems, manufacturing cost, and time-to-market limitations. In this situation, there are many opportunities for malicious alterations by off-shore companies. In this paper, we proposed a new placement algorithm that hinders the hardwa...
GREENER: A Tool for Improving Energy Efficiency of Register Files
Graphics Processing Units (GPUs) maintain a large register file to increase thread block occupancy, hence to improve the thread level parallelism (TLP). However, register files in the GPU dissipate a significant portion of the total leakage power. Leakage power of the register file can be reduced by putting the registers into low power (SLEEP or OFF) state. However, one challenge in doing so is...
Publication date: 2014